This repository was archived by the owner on Jun 3, 2025. It is now read-only.

[Experimental][StarCode] KV Cache Injection #2080

Closed
wants to merge 6 commits into from


dbogunowicz
Contributor

@dbogunowicz dbogunowicz commented Feb 15, 2024

Feature Description

The results of my experimentation with the tiny_starcoder model.

Findings:

  • the original KV cache is exposed not as separate arrays (past_key_values.{attn_block_id}.values and past_key_values.{attn_block_id}.keys) but as a single joint array of keys and values. I did not get to look into splitting the two, but from analyzing the ONNX graph I see no reason why we could not do it
  • the causal mask for this model has different dimensions than what we usually assume. This could be patched by adding a node after the causal_mask input that applies the appropriate permutation to the input.
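
To illustrate the two findings, here is a minimal NumPy sketch rather than the actual graph surgery. It rests on assumptions not confirmed in this PR: that the joint array stores keys and values as two equal halves of the last axis, and that the mask fix amounts to swapping axes 1 and 2. The helper names split_joint_kv and permute_causal_mask, the shapes, and the permutation (0, 2, 1, 3) are all hypothetical and would need to be checked against the exported ONNX graph.

```python
import numpy as np


def split_joint_kv(joint_kv, head_dim):
    """Split a joint key/value cache array into separate key and value arrays.

    ASSUMPTION: keys occupy the first `head_dim` entries of the last axis
    and values the remaining `head_dim` entries. Verify this layout against
    the real model before relying on it.
    """
    if joint_kv.shape[-1] != 2 * head_dim:
        raise ValueError(
            f"expected last axis of size {2 * head_dim}, got {joint_kv.shape[-1]}"
        )
    # Two equal halves along the last axis.
    keys, values = np.split(joint_kv, 2, axis=-1)
    return keys, values


def permute_causal_mask(mask, perm=(0, 2, 1, 3)):
    """Reorder the axes of a 4-D causal mask.

    ASSUMPTION: the default `perm` is only an illustrative guess at the
    permutation the model needs.
    """
    return np.transpose(mask, perm)


# Hypothetical toy shapes.
batch, seq_len, head_dim = 1, 4, 8

# Joint cache of shape (batch, seq_len, 2 * head_dim).
joint = np.arange(batch * seq_len * 2 * head_dim, dtype=np.float32).reshape(
    batch, seq_len, 2 * head_dim
)
keys, values = split_joint_kv(joint, head_dim)

# Lower-triangular causal mask of shape (batch, 1, seq_len, seq_len).
mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))[None, None, :, :]
patched = permute_causal_mask(mask)

print(keys.shape, values.shape, patched.shape)
```

In an ONNX graph the same ideas would presumably map to a Split node on the joint cache and a Transpose node inserted right after the causal_mask input, but that mapping has not been tried on this branch.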

This is an experimental branch. I am pausing development on it for now due to other priorities; to be revisited in the future.

@jeanniefinks
Member

Per the main README announcement, SparseML is being deprecated by June 2, 2025. Closing the PR as work has been suspended; thank you for the inputs and support!

@jeanniefinks jeanniefinks deleted the experimentation branch June 2, 2025 19:58